
    Optimal Order and Efficiency for Iterations with Two Evaluations

    The problem is to calculate a simple zero of a nonlinear function f. We consider rational iterations without memory which use two evaluations of f or its derivatives. It is shown that the optimal order is 2. This settles, for the case n = 2, a conjecture of Kung and Traub that an iteration using n evaluations without memory is of order at most 2ⁿ⁻¹. Furthermore, we show that any rational two-evaluation iteration of optimal order must use either two evaluations of f or one evaluation of f and one of f'. From this result we completely settle the question of the optimal efficiency, in our efficiency measure, for any two-evaluation iteration without memory. Depending on the relative cost of evaluating f and f', the optimal efficiency is achieved by either Newton iteration or the iteration ψ.
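
    For context, here is a minimal sketch of the Newton iteration mentioned above, which spends its two evaluations per step on f and f' and attains the optimal order 2. This is illustrative Python, not code from the paper:

    ```python
    def newton(f, fprime, x0, tol=1e-12, max_iter=50):
        """Newton iteration: one evaluation of f and one of f' per step.

        For a simple zero, convergence is of order 2 -- the optimal
        order for a two-evaluation iteration without memory.
        """
        x = x0
        for _ in range(max_iter):
            fx = f(x)              # first evaluation
            step = fx / fprime(x)  # second evaluation
            x -= step
            if abs(step) < tol:
                break
        return x

    # Example: the simple zero of f(x) = x^2 - 2 at sqrt(2).
    root = newton(lambda x: x * x - 2, lambda x: 2 * x, x0=1.0)
    ```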

    Performance Gains in Conjugate Gradient Computation with Linearly Connected GPU Multiprocessors

    Conjugate gradient is an important iterative method used for solving least squares problems. It is compute-bound and generally involves only simple matrix computations. One would expect that we could fully parallelize such computation on the GPU architecture with multiple Stream Multiprocessors (SMs), each consisting of many SIMD processing units. While implementing a conjugate gradient method for compressive sensing signal reconstruction, we have noticed that a large speed-up due to parallel processing is actually infeasible due to the high I/O cost between SMs and GPU global memory. We have found that if SMs were linearly connected, we could gain a 15x speedup by loop unrolling. We conclude that adding these relatively inexpensive neighbor connections for SMs can significantly enhance the applicability of GPUs to a large class of similar matrix computations.
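
    For reference, a minimal CPU sketch of the conjugate gradient method itself (textbook form for a symmetric positive definite system; for least squares one would apply it to the normal equations AᵀAx = Aᵀb). This is not the paper's GPU implementation, but it shows why each iteration is dominated by one matrix-vector product, the step whose SM-to-global-memory traffic the abstract identifies as the bottleneck:

    ```python
    import numpy as np

    def conjugate_gradient(A, b, tol=1e-8, max_iter=1000):
        """Solve A x = b for symmetric positive definite A.

        Each iteration costs one matrix-vector product (A @ p) plus a
        few vector updates; the product dominates both compute and I/O.
        """
        x = np.zeros_like(b, dtype=float)
        r = b - A @ x
        p = r.copy()
        rs_old = r @ r
        for _ in range(max_iter):
            Ap = A @ p
            alpha = rs_old / (p @ Ap)
            x += alpha * p
            r -= alpha * Ap
            rs_new = r @ r
            if np.sqrt(rs_new) < tol:
                break
            p = r + (rs_new / rs_old) * p
            rs_old = rs_new
        return x
    ```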

    Multipoint efficient iterative methods and the dynamics of Ostrowski's method

    This is an Author's Accepted Manuscript of an article published as: José L. Hueso, Eulalia Martínez & Carles Teruel (2019), Multipoint efficient iterative methods and the dynamics of Ostrowski's method, International Journal of Computer Mathematics, 96:9 (Sep 2019), 1687-1701, DOI: 10.1080/00207160.2015.1080354 [copyright Taylor & Francis], available online at: http://www.tandfonline.com/10.1080/00207160.2015.1080354

    [EN] In this work, we introduce a modification into the technique presented in A. Cordero, J.L. Hueso, E. Martínez, and J.R. Torregrosa [Increasing the convergence order of an iterative method for nonlinear systems, Appl. Math. Lett. 25 (2012), pp. 2369-2374] that increases the convergence order of an iterative method by two units. The main idea is to compose a given iterative method of order p with a modification of Newton's method that introduces just one new evaluation of the function, obtaining a method of order p+2 while avoiding the computation of more than one derivative; this improves the efficiency index in the scalar case. The procedure can be repeated n times, with the same approximation to the derivative, yielding new iterative methods of order p+2n. We perform different numerical tests that confirm the theoretical results. Applying this procedure to Newton's method yields the well-known fourth-order Ostrowski's method. We finally analyse its dynamical behaviour on second- and third-degree real polynomials.

    This research was supported by Ministerio de Economía y Competitividad under grant PGC2018-095896-B-C22 and by the project of Generalitat Valenciana Prometeo/2016/089.

    Citation: Hueso, JL.; Martínez Molada, E.; Teruel-Ferragud, C. (2019). Multipoint efficient iterative methods and the dynamics of Ostrowski's method. International Journal of Computer Mathematics. 96(9):1687-1701. https://doi.org/10.1080/00207160.2015.1080354

    References:
    - Amat, S., Busquier, S., & Plaza, S. (2010). Chaotic dynamics of a third-order Newton-type method. Journal of Mathematical Analysis and Applications, 366(1), 24-32. doi:10.1016/j.jmaa.2010.01.047
    - Cordero, A., & Torregrosa, J. R. (2007). Variants of Newton's Method using fifth-order quadrature formulas. Applied Mathematics and Computation, 190(1), 686-698. doi:10.1016/j.amc.2007.01.062
    - Cordero, A., Martínez, E., & Torregrosa, J. R. (2009). Iterative methods of order four and five for systems of nonlinear equations. Journal of Computational and Applied Mathematics, 231(2), 541-551. doi:10.1016/j.cam.2009.04.015
    - Cordero, A., Hueso, J. L., Martínez, E., & Torregrosa, J. R. (2012). Increasing the convergence order of an iterative method for nonlinear systems. Applied Mathematics Letters, 25(12), 2369-2374. doi:10.1016/j.aml.2012.07.005
    - Jarratt, P. (1966). Some fourth order multipoint iterative methods for solving equations. Mathematics of Computation, 20(95), 434-434. doi:10.1090/s0025-5718-66-99924-
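
    The composition described above, applied once to Newton's method, yields the classical fourth-order Ostrowski method: a Newton predictor step followed by a corrector that reuses f'(x), so each iteration costs two evaluations of f and one of f'. A minimal sketch in its standard formulation (illustrative, not code from the paper):

    ```python
    def ostrowski(f, fprime, x0, tol=1e-12, max_iter=30):
        """Ostrowski's fourth-order method for a simple zero of f."""
        x = x0
        for _ in range(max_iter):
            fx = f(x)
            dfx = fprime(x)
            y = x - fx / dfx                             # Newton predictor
            fy = f(y)
            x_new = y - fy * fx / (dfx * (fx - 2 * fy))  # corrector, reuses f'(x)
            if abs(x_new - x) < tol:
                return x_new
            x = x_new
        return x

    # Example: converges to sqrt(2) in very few iterations.
    root = ostrowski(lambda x: x * x - 2, lambda x: 2 * x, x0=1.0)
    ```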

    Fast computation of Bernoulli, Tangent and Secant numbers

    We consider the computation of Bernoulli, Tangent (zag), and Secant (zig or Euler) numbers. In particular, we give asymptotically fast algorithms for computing the first n such numbers in O(n^2 (log n)^(2+o(1))) bit-operations. We also give very short in-place algorithms for computing the first n Tangent or Secant numbers in O(n^2) integer operations. These algorithms are extremely simple, and fast for moderate values of n. They are faster and use less space than the algorithms of Atkinson (for Tangent and Secant numbers) and Akiyama and Tanigawa (for Bernoulli numbers).

    Comment: 16 pages. To appear in Computational and Analytical Mathematics (associated with the May 2011 workshop in honour of Jonathan Borwein's 60th birthday). For further information, see http://maths.anu.edu.au/~brent/pub/pub242.htm
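
    A short sketch of an in-place O(n^2) Tangent-number computation in the spirit of the paper's algorithm (the exact presentation in the paper may differ): initialize T_k = (k-1)! and then sweep a two-term recurrence over the array. The assertion checks the first values:

    ```python
    def tangent_numbers(n):
        """First n Tangent numbers T_1..T_n (1, 2, 16, 272, 7936, ...)
        using O(n^2) integer operations, updating one array in place.
        """
        T = [0] * (n + 1)          # 1-indexed; T[0] unused
        T[1] = 1
        for k in range(2, n + 1):
            T[k] = (k - 1) * T[k - 1]
        for k in range(2, n + 1):
            for j in range(k, n + 1):
                T[j] = (j - k) * T[j - 1] + (j - k + 2) * T[j]
        return T[1:]

    assert tangent_numbers(5) == [1, 2, 16, 272, 7936]
    ```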

    On the complexity of strongly connected components in directed hypergraphs

    We study the complexity of some algorithmic problems on directed hypergraphs and their strongly connected components (SCCs). The main contribution is an almost linear time algorithm computing the terminal strongly connected components (i.e. SCCs which do not reach any components other than themselves). "Almost linear" here means that the complexity of the algorithm is linear in the size of the hypergraph up to a factor alpha(n), where alpha is the inverse of the Ackermann function and n is the number of vertices. Our motivation to study this problem arises from a recent application of directed hypergraphs to computational tropical geometry. We also discuss the problem of computing all SCCs. We establish a superlinear lower bound on the size of the transitive reduction of the reachability relation in directed hypergraphs, showing that it is combinatorially more complex than in directed graphs. Besides, we prove a linear time reduction from the well-studied problem of finding all minimal sets among a given family to the problem of computing the SCCs. Only subquadratic time algorithms are known for the former problem. These results strongly suggest that the problem of computing the SCCs is harder in directed hypergraphs than in directed graphs.

    Comment: v1: 32 pages, 7 figures; v2: revised version, 34 pages, 7 figures
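
    To make the model concrete, here is a toy sketch of reachability in a directed hypergraph, where each hyperarc has a set of tail vertices and a head that becomes reachable only once all tails are reachable (standard linear-time forward chaining over B-hyperarcs; this illustrates the setting only, not the paper's terminal-SCC algorithm):

    ```python
    from collections import deque

    def b_reachable(n, hyperarcs, sources):
        """Vertices reachable from `sources`; hyperarcs are
        (tails, head) pairs with tails a set of vertex ids."""
        waiting = [len(tails) for tails, _ in hyperarcs]  # unseen tails per arc
        arcs_of = [[] for _ in range(n)]                  # vertex -> incident arcs
        for i, (tails, _) in enumerate(hyperarcs):
            for v in tails:
                arcs_of[v].append(i)
        seen = set(sources)
        queue = deque(seen)
        while queue:
            v = queue.popleft()
            for i in arcs_of[v]:
                waiting[i] -= 1
                if waiting[i] == 0:           # all tails reached: arc fires
                    head = hyperarcs[i][1]
                    if head not in seen:
                        seen.add(head)
                        queue.append(head)
        return seen

    # Arc ({0, 1}, 2) fires only once both 0 and 1 are reached.
    assert b_reachable(3, [({0, 1}, 2)], {0}) == {0}
    assert b_reachable(3, [({0, 1}, 2)], {0, 1}) == {0, 1, 2}
    ```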

    PNNU: parallel nearest-neighbor units for learned dictionaries

    We present a novel parallel approach, parallel nearest neighbor unit (PNNU), for finding the nearest member in a learned dictionary of high-dimensional features. This is a computation fundamental to machine learning and data analytics algorithms such as sparse coding for feature extraction. PNNU achieves high performance by using three techniques: (1) PNNU employs a novel fast table look-up scheme to identify a small number of atoms as candidates from which the nearest neighbor of a query data vector can be found; (2) PNNU reduces computation cost by working with candidate atoms of reduced dimensionality; and (3) PNNU performs computations in parallel over multiple cores with low inter-core communication overheads. Based on efficient computation via techniques (1) and (2), technique (3) attains further speed-up via parallel processing. We have implemented PNNU on multi-core machines. We demonstrate its superior performance on three application tasks in signal processing and computer vision. For an action recognition task, PNNU achieves 41x overall performance gains on a 16-core compute server against a conventional serial implementation of nearest neighbor computation. Our PNNU software is available online as open source.

    Funded by Naval Postgraduate School and Intel Corporation, Agreement no. N00244-15-0050 (NPS).
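
    A toy sketch of the candidates-then-exact-search pattern that techniques (1) and (2) describe. The table construction below (bucketing atoms by quantized values in a few high-variance dimensions) is an illustrative stand-in, not the paper's actual scheme, and the parallel dispatch of technique (3) is omitted:

    ```python
    import numpy as np

    def build_tables(D, num_dims=8, num_bins=32):
        """Bucket dictionary atoms (rows of D) by quantized value in a
        few high-variance dimensions -- a hypothetical table scheme."""
        dims = np.argsort(D.var(axis=0))[-num_dims:]
        lo, hi = D[:, dims].min(axis=0), D[:, dims].max(axis=0)
        bins = np.clip(((D[:, dims] - lo) / (hi - lo + 1e-12)
                        * num_bins).astype(int), 0, num_bins - 1)
        tables = [{} for _ in dims]
        for atom in range(D.shape[0]):
            for t, b in enumerate(bins[atom]):
                tables[t].setdefault(b, []).append(atom)
        return dims, lo, hi, tables

    def nearest(D, query, dims, lo, hi, tables, num_bins=32):
        """Gather candidate atoms from the tables, then run the exact
        distance computation over the small candidate set only."""
        q = np.clip(((query[dims] - lo) / (hi - lo + 1e-12)
                     * num_bins).astype(int), 0, num_bins - 1)
        cand = {a for t, b in enumerate(q) for a in tables[t].get(b, [])}
        cand = list(cand) if cand else list(range(D.shape[0]))
        dists = np.linalg.norm(D[cand] - query, axis=1)
        return cand[int(np.argmin(dists))]
    ```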

    Memory requirements for balanced computer architectures

    In this paper, a processing element (PE) is characterized by its computation bandwidth, I/O bandwidth, and the size of its local memory. In carrying out a computation, a PE is said to be balanced if the computing time equals the I/O time. Consider a balanced PE for some computation, and suppose that the computation bandwidth of the PE is increased by a factor of α relative to its I/O bandwidth. When carrying out the same computation, the PE will then be imbalanced; i.e., it will have to wait for I/O. A standard method of avoiding this I/O bottleneck is to reduce the overall I/O requirement of the PE by increasing the size of its local memory. This paper addresses the question of by how much the PE's local memory must be enlarged in order to restore balance.

    The following results are shown: For matrix computations such as matrix multiplication and Gaussian elimination, the size of the local memory must be increased by a factor of α². For computations such as relaxation on a k-dimensional grid, the local memory must be enlarged by a factor of αᵏ. For some other computations, such as the FFT and sorting, the increase is exponential; i.e., the size of the new memory must be the size of the original memory raised to the αth power. All these results indicate that to design a balanced PE, the size of its local memory must be increased much more rapidly than its computation bandwidth. This phenomenon seems to be common for many computations where an output may depend on a large subset of the inputs.

    Implications of these results for some parallel computer architectures are also discussed. One particular result is that to balance an array of p linearly connected PEs for performing matrix computations such as matrix multiplication and matrix triangularization, the size of each PE's local memory must grow linearly with p. Thus, the larger the array is, the larger each PE's local memory must be.
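
    For a sense of where the α² factor comes from, here is a compressed version of the matrix-multiplication case, assuming the standard I/O result (Hong and Kung) that a PE with local memory of size M can perform Θ(M^(3/2)) operations per Θ(M) words transferred:

    ```latex
    % Balance condition for matrix multiplication with local memory M,
    % compute bandwidth C (ops/s), and I/O bandwidth B (words/s),
    % assuming Theta(M^{3/2}) ops per Theta(M) words of I/O:
    \[
      \underbrace{\frac{M^{3/2}}{C}}_{\text{compute time}}
      \;=\;
      \underbrace{\frac{M}{B}}_{\text{I/O time}}
      \quad\Longrightarrow\quad
      \sqrt{M} = \frac{C}{B}
      \quad\Longrightarrow\quad
      M = \left(\frac{C}{B}\right)^{2}.
    \]
    % Increasing C by a factor of alpha at fixed B therefore forces M
    % to grow by alpha^2. For the FFT, the ratio is only about log M
    % ops per word of I/O, so log M must scale with C/B and the memory
    % grows exponentially, i.e. to the alpha-th power.
    ```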